Computing exact P-values for DNA motifs
Abstract
Motivation: Many heuristic algorithms have been designed to approximate the P-values of DNA motifs described by position weight matrices, in order to evaluate their statistical significance. These approximations often deviate from the true P-value by orders of magnitude, yet exact P-values are needed to rank motifs. Furthermore, and surprisingly, the computational complexity of the problem has been unknown.
Results: We show that the problem is NP-hard, and we present MotifRank, software based on dynamic programming that calculates exact P-values of motifs. We define the exact P-value on a more general and more precise model. Asymptotically, MotifRank is faster than the best existing exact P-value algorithm, and it is practical in use. Our experiments demonstrate that MotifRank substantially improves the accuracy of existing approximation algorithms.
Availability: MotifRank is available from http://bio.dlg.cn.
Supplementary information: Supplementary data are available at Bioinformatics online.
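The abstract says MotifRank is based on dynamic programming but does not give its recurrence or its "more general" model. For orientation only, here is the textbook dynamic program for the score distribution of a position weight matrix. It is not MotifRank's algorithm and rests on assumptions of ours:

- scores are integers, as if the log-odds weights had been scaled and rounded;
- the background is independent and identically distributed (order 0);
- the P-value of a threshold t is P(score >= t).

The function names and the toy matrix are hypothetical. Exact rational arithmetic via `fractions.Fraction` keeps the result free of floating-point error. A brute-force enumeration is included to cross-check short motifs.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

ALPHABET = "ACGT"


def score_distribution(pwm, background):
    """Exact distribution of the PWM score of a random word (hypothetical helper).

    pwm: list of dicts, one per motif position, mapping base -> integer score
         (assumption: integer-scaled weights, not raw real-valued log-odds).
    background: dict base -> Fraction for an i.i.d. (order-0) background.
    Returns a dict mapping each reachable total score to its probability.
    """
    dist = {0: Fraction(1)}  # before any position: score 0 with probability 1
    for column in pwm:
        nxt = defaultdict(Fraction)  # missing scores start at Fraction(0)
        for partial, prob in dist.items():
            for base in ALPHABET:
                # extend every partial score by one more base
                nxt[partial + column[base]] += prob * background[base]
        dist = dict(nxt)
    return dist


def exact_pvalue(pwm, background, threshold):
    """P(score >= threshold) under the assumed background model."""
    dist = score_distribution(pwm, background)
    return sum((p for s, p in dist.items() if s >= threshold), Fraction(0))


def brute_force_pvalue(pwm, background, threshold):
    """Enumerate all 4**L words; only feasible for very short motifs."""
    total = Fraction(0)
    for word in product(ALPHABET, repeat=len(pwm)):
        score = sum(col[b] for col, b in zip(pwm, word))
        if score >= threshold:
            prob = Fraction(1)
            for b in word:
                prob *= background[b]
            total += prob
    return total


# Toy 3-column matrix with made-up integer scores and a uniform background.
pwm = [
    {"A": 3, "C": -1, "G": 0, "T": -2},
    {"A": -1, "C": 2, "G": 1, "T": -1},
    {"A": 0, "C": -2, "G": 3, "T": 1},
]
bg = {b: Fraction(1, 4) for b in ALPHABET}
print(exact_pvalue(pwm, bg, 6))  # 3/64 (words ACG, ACT, AGG)
```

The dynamic program does O(L · 4 · S) work, where L is the motif length and S is the number of distinct partial scores. Brute force does O(4^L) work. Because S grows with the range of the integer scores rather than with L alone, this kind of pseudo-polynomial method need not conflict with the NP-hardness result stated above.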
Similar resources
Computing exact p-values for DNA motifs (Part I)
A bi-level linear programming problem for computing the nadir point in MOLP
Computing the exact ideal and nadir criterion values is a very important subject in multi-objective linear programming (MOLP) problems. In fact, these values define the ideal and nadir points as lower and upper bounds on the nondominated points. Whereas determining the ideal point is easy, because it is equivalent to optimizing a convex function (linear function) over a con...
Computing Exact p-Value for Structured Motif
Extracting motifs from a set of DNA sequences is important in computational biology. Occurrence probability is a commonly used statistic to evaluate the statistical significance of a motif. A main problem is how to calculate the occurrence probability of the motif on the random model of DNA sequence efficiently and accurately. In this paper, we are interested in a particular motif model which is...
P-value-based regulatory motif discovery using positional weight matrices.
To analyze gene regulatory networks, the sequence-dependent DNA/RNA binding affinities of proteins and noncoding RNAs are crucial. Often, these are deduced from sets of sequences enriched in factor binding sites. Two classes of computational approaches exist. The first describe binding motifs by sequence patterns and search the patterns with highest statistical significance for enrichment. The ...
Efficient representation and P-value computation for high-order Markov motifs
MOTIVATION Position weight matrices (PWMs) have become a standard for representing biological sequence motifs. Their relative simplicity has favoured the development of efficient algorithms for diverse tasks such as motif identification, sequence scanning and statistical significance evaluation. Markov chain-based models generalize the PWM model by allowing for interposition dependencies to be c...
Journal: Bioinformatics
Volume 23, Issue 5
Published: 2007